Brian S. Evans, Ph.D.
Migratory Bird Center
Smithsonian Conservation Biology Institute
# Load RCurl library:
library(RCurl)
# Load a source script:
script <-
  getURL(
    "https://raw.githubusercontent.com/bsevansunc/workshop_languageOfR/master/sourceCode.R"
  )
# Evaluate then remove the source script:
eval(parse(text = script))
rm(script)
Why would you use for loops?
# Filter irisTbl to setosa:
irisTbl[irisTbl$species == 'setosa', ]
# Extract the petalLength field (column):
irisTbl[irisTbl$species == 'setosa', ]$petalLength
# Calculate the mean of petal lengths:
mean(irisTbl[irisTbl$species == 'setosa', ]$petalLength)
Calculate the mean petal length of each of the Iris species using matrix notation (as above) and a custom function.
# Mean petal lengths, matrix notation:
mean(irisTbl[irisTbl$species == 'setosa', ]$petalLength)
mean(irisTbl[irisTbl$species == 'versicolor', ]$petalLength)
mean(irisTbl[irisTbl$species == 'virginica', ]$petalLength)
# Mean petal lengths, function method:
meanPetalFun <- function(spp){
  mean(irisTbl[irisTbl$species == spp, ]$petalLength)
}
meanPetalFun('setosa')
meanPetalFun('versicolor')
meanPetalFun('virginica')
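The function above reads irisTbl from the global environment. A minimal self-contained sketch, assuming irisTbl is a renamed copy of R's built-in iris data, passes the data in as an argument instead. The names irisDemo and meanPetalLength are hypothetical and do not come from the workshop script.

```r
# Hypothetical stand-in for irisTbl, built from R's built-in iris data:
irisDemo <- data.frame(
  species = iris$Species,
  petalLength = iris$Petal.Length
)

# Mean petal length for one species; the data frame is an explicit argument:
meanPetalLength <- function(data, spp){
  mean(data[data$species == spp, ]$petalLength)
}

meanPetalLength(irisDemo, 'setosa')
meanPetalLength(irisDemo, 'versicolor')
meanPetalLength(irisDemo, 'virginica')
```

Even with a function, each species still needs its own call. That repetition is what a for loop removes.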
Consider the following numeric vector, v:
| [1] | [2] | [3] | [4] | [5] |
|---|---|---|---|---|
| 1 | 1 | 2 | 3 | 5 |
Vector v is an R object comprised of five numbers.
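In the workshop, v is presumably created by the source script loaded at the start. If that script is unavailable, a vector with the values shown above can be defined directly. This is an assumption about how v was built, not the script's actual code.

```r
# Define v with the five values shown in the table above:
v <- c(1, 1, 2, 3, 5)
```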
# Explore vector v:
v
class(v)
str(v)
length(v)
Each value in a vector has a position, denoted by “[i]”.
Recall: v[i] is the value of v at position i.
# Explore vector v using indexing:
i <- 3
v[i]
v[3]
v[3] == v[i]
# Add 1 to the value of v at position three:
i <- 3
v[3] + 1
v[i] + 1
Writing a proper for loop requires three steps: define an output object, define the sequence, and write the body.
ALWAYS specify an object to store your output!
Vector objects are defined as:
# Define a vector for output:
vNew <- vector('numeric', length = length(v))
str(vNew)
# Explore filling values of vNew by index:
i <- 3
v[i]
vNew[i] <- v[i] + 1
vNew[i]
v[i] + 1 == vNew[i]
The sequence can be defined as:
v
1:5
1:length(v)
seq_along(v)
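Of these options, seq_along(v) is usually the safest choice. The sketch below shows why, using a hypothetical empty vector named emptyVector. When a vector has no elements, 1:length() counts backwards from 1 to 0, so a loop body would run twice. seq_along() yields an empty sequence, so the loop is skipped.

```r
# A hypothetical empty vector (not part of the workshop data):
emptyVector <- numeric(0)

# 1:length() counts down from 1 to 0:
1:length(emptyVector)

# seq_along() returns a zero-length sequence:
seq_along(emptyVector)

# Count how many times each loop runs:
nColon <- 0
for(i in 1:length(emptyVector)) nColon <- nColon + 1

nSeqAlong <- 0
for(i in seq_along(emptyVector)) nSeqAlong <- nSeqAlong + 1

nColon
nSeqAlong
```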
# Example for loop sequence statements:
# for(i in 1:length(v))
# for(i in seq_along(v))
The for loop body describes what will happen at each iteration of the loop. For example:
i <- 3
vNew[i] <- v[i] + 1
# For loop output:
vNew <- numeric(length = length(v))
# For loop sequence:
for(i in seq_along(v)){
  # For loop body:
  vNew[i] <- v[i] + 1
}
# Explore first for loop output:
vNew
v
vNew == v + 1
Split-Apply-Combine
# Mean petal lengths of Iris species without a for loop:
mean(irisTbl[irisTbl$species == 'setosa', ]$petalLength)
mean(irisTbl[irisTbl$species == 'versicolor', ]$petalLength)
mean(irisTbl[irisTbl$species == 'virginica', ]$petalLength)
Start by creating a vector of species:
# Make a vector of species to loop across:
irisSpecies <- levels(irisTbl$species)
irisSpecies
Create an empty vector to store our output:
# For loop output statement:
petalLengths <- vector('numeric',length = length(irisSpecies))
petalLengths
Split: The for loop body starts by splitting the data:
# Exploring the iris data, subsetting by species:
i <- 3
irisSpecies[i]
irisTbl[irisTbl$species == irisSpecies[i], ]
# Split:
iris_sppSubset <- irisTbl[irisTbl$species == irisSpecies[i], ]
Apply: Modification of the data:
# Calculate mean petal length of each subset:
mean(iris_sppSubset$petalLength)
# Make a vector of species to loop across:
irisSpecies <- levels(irisTbl$species)
# For loop output statement:
petalLengths <- vector('numeric',length = length(irisSpecies))
# For loop:
for(i in seq_along(irisSpecies)){
  # Split:
  iris_sppSubset <- irisTbl[irisTbl$species == irisSpecies[i], ]
  # Apply:
  petalLengths[i] <- mean(iris_sppSubset$petalLength)
}
Combine: Combine the for loop output:
# Make a tibble data frame of the for loop output:
petalLengthFrame <- data_frame(species = irisSpecies, meanPetalLength = petalLengths)
petalLengthFrame
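One way to check a split-apply-combine loop is to compare it with a base R function that does the same job in one call. The sketch below repeats the loop on R's built-in iris data and compares the result with tapply(). It assumes the Species and Petal.Length columns of iris correspond to species and petalLength in irisTbl. The object names ending in Check are hypothetical.

```r
# Species to loop across, from the built-in iris data:
speciesCheck <- levels(iris$Species)

# Output vector:
meansCheck <- vector('numeric', length = length(speciesCheck))

for(i in seq_along(speciesCheck)){
  # Split:
  subsetCheck <- iris[iris$Species == speciesCheck[i], ]
  # Apply:
  meansCheck[i] <- mean(subsetCheck$Petal.Length)
}

# Combine:
data.frame(species = speciesCheck, meanPetalLength = meansCheck)

# Same calculation in a single call:
tapplyCheck <- tapply(iris$Petal.Length, iris$Species, mean)

# The two approaches should agree:
all.equal(meansCheck, as.vector(tapplyCheck))
```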
Use a for loop and the birdHabits data frame to calculate the number of species in each diet guild.
birdHabits
diets <- unique(birdHabits$diet)
outVector <- vector('numeric', length = length(diets))
for(i in seq_along(outVector)){
  # Split:
  dietSubset <- birdHabits[birdHabits$diet == diets[i],]
  # Apply:
  outVector[i] <- nrow(dietSubset)
}
# Combine:
data_frame(diet = diets, nSpecies = outVector)
For loops can be used to explore data objects with common features.
How many omnivorous birds were observed at each site?
# Explore the bird count data:
head(birdCounts)
str(birdCounts)
# Explore the bird trait data:
head(birdHabits)
str(birdHabits)
Get a vector of the omnivorous species from the birdHabits data frame:
# Extract vector of omnivorous species:
omnivores <- birdHabits[birdHabits$diet == 'omnivore',]$species
Split the data into individual sites.
# Generate a vector of unique sites:
sites <- unique(birdCounts$site)
# Site at position i:
i <- 3
sites[i]
# Subset data:
birdCounts_siteSubset <- birdCounts[birdCounts$site == sites[i],]
birdCounts_siteSubset
Split: Use %in% to extract the counts from only those records associated with omnivores.
# Just a vector of omnivore counts:
countVector <-
  birdCounts_siteSubset[birdCounts_siteSubset$species %in%
                          omnivores,]$count
Apply: Sum the count vector.
# Get total number of omnivores at the site:
nOmnivores <- sum(countVector)
Combine: Values combined using the vector method
sites <- unique(birdCounts$site)
outVector <- vector('numeric', length = length(unique(sites)))
for(i in seq_along(sites)){
  # Split:
  birdCounts_siteSubset <- birdCounts[birdCounts$site == sites[i],]
  countVector <-
    birdCounts_siteSubset[birdCounts_siteSubset$species %in%
                            omnivores, ]$count
  # Apply:
  outVector[i] <- sum(countVector)
}
# Combine:
data_frame(site = sites, nOmnivores = outVector)
Combine: Values combined using the list method
sites <- unique(birdCounts$site)
outList <- vector('list', length = length(unique(sites)))
for(i in seq_along(sites)){
  # Split:
  birdCounts_siteSubset <- birdCounts[birdCounts$site == sites[i],]
  countVector <-
    birdCounts_siteSubset[birdCounts_siteSubset$species %in%
                            omnivores,]$count
  # Apply, storing a one-row data frame for each site:
  outList[[i]] <- data_frame(
    site = sites[i],
    nOmnivores = sum(countVector))
}
# Combine:
bind_rows(outList)
A for loop can also generate a vector of numbers based on a mathematical function. For example:
\[n_t = 2(n_{t-1})\]
# For loop output:
n <- vector('numeric', length = 5)
n
# Set the seed value:
n[1] <- 10
n
# For loop sequence:
# for(i in 2:length(n))
Body: For each iteration (example, position 2):
# Exploring the construction of the for loop body:
i <- 2
n[i]
n[i-1]
n[i] <- 2*n[i-1]
n
# Output:
n <- vector('numeric', length = 5)
# Seed:
n[1] <- 10
# For loop:
for(i in 2:length(n)){
  # Each value is twice the previous value:
  n[i] <- 2*n[i-1]
}
One of my favorite for loops was created by Leonardo Bonacci (Fibonacci). He created the first known population model, from which the famous Fibonacci number series arose. He described a population (N) of rabbits at time t as the sum of the population at the previous time step and the population at the time step before that:
\[N_t = N_{t-1} + N_{t-2}\]
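This recurrence can be written with the same output, seed, sequence, and body steps used above. A minimal sketch follows, assuming a series of ten time steps seeded with 1 and 1. The object name fib, the length, and the seed values are illustrative choices, not taken from the workshop.

```r
# Output:
fib <- vector('numeric', length = 10)

# Seed the first two time steps:
fib[1:2] <- 1

# For loop, starting at the third position because each value
# depends on the two values before it:
for(i in 3:length(fib)){
  fib[i] <- fib[i - 1] + fib[i - 2]
}

fib
```

With these seeds, the first five values match the vector v used earlier (1, 1, 2, 3, 5).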